ABSTRACT

We present an algorithm that allows fast and efficient detection of transits, including planetary 
transits, from light-curves. The method is based on building an ensemble of fiducial models 
and compressing the data using the MOPED algorithm. We describe the method and demon- 
strate its efficiency by finding planet-like transits in simulated Pan-STARRS light-curves. We 
show that our method is independent of the size of the search space of transit parameters.
In large sets of light-curves, we achieve speed-up factors of order 10⁸ over the full
χ² search. We discuss how the algorithm can be used in forthcoming large surveys like
Pan-STARRS and LSST and how it may be optimized for future space missions like Kepler and
COROT, where most of the processing must be done on board.
1 INTRODUCTION

If the orbit of a planet around a star is so favorably inclined that
sin(i) ≈ 1, the planet will transit the disk of the star once per orbit.
During the transit the observed flux from the star is reduced by the
ratio of the areas of the planet and the star, typically ~1% for a
Jupiter-like planet around a Sun-like star. When this photometric
dimming is observed to repeat periodically, a small-radius companion
may be inferred to exist. This effect is seen in the star HD209458
(Charbonneau et al. 2000), which was first identified as a planetary
system using the radial velocity technique. The added value of
the detection of transits is great: not only is the sin(i) ambiguity
resolved, but the radius of the planet may be inferred, and spectroscopic
examination of the object during transit allows the study of
the atmosphere of the planet (Charbonneau et al. 2002).

The transit technique to search for planets has some advan- 
tages: photometry is less costly in telescope time than spectroscopy, 
and one knows sin(i) for all the systems found this way. The ma- 
jor disadvantage is that the yield is comparatively low, since only 
systems with sin(i) ≈ 1 will be detected.

A large number of transit searches for extra-solar planets, both
space-based and ground-based, have been completed or are underway
(Gilliland 2000; Mochejska et al. 2002; Udalski et al. 2002,
2003; Mallen-Ornelas et al. 2003). Many of these efforts employ
small-aperture, wide-field cameras to monitor tens of thousands of
nearby, bright stars. Of the surveys using this approach, the only
success to date has come from the Trans-atlantic Exoplanet Survey
(TrES), which recently announced the discovery of a planet
dubbed TrES-1 (Alonso et al. 2004). The only other success (albeit
for fainter stars where follow-up is more difficult) has come
from the OGLE survey (Udalski et al. 2002, 2003). The vast majority
of transits have been false detections resulting from grazing
transits of stellar companions or a blend of an eclipsing binary
with a brighter foreground or background star (Torres et al. 2004;
Pont et al. 2005). Some progress has been made at differentiating
these from planets (Hoekstra et al. 2005). However, the few candidates
not eliminated by follow-up studies, in particular OGLE-TR-56b
with its 1.2-day orbital period, further challenge our already
revised models of planet formation (Konacki et al. 2003).

The detection of a weak, short, periodic transit in noisy light-curves
is a challenging task. The large number of light-curves
collected makes automation and optimization a necessity.
This requirement is even stronger in the context of space
missions, where much of the processing must be done on board.
A number of transit-detection algorithms have been implemented
in the literature (Doyle et al. 2000; Defaÿ, Deleuil & Barge 2001;
Aigrain & Favata 2002; Jenkins et al. 2002; Kovács et al. 2002;
Udalski et al. 2002; Street et al. 2003) and there has been some
effort to compare their respective performances (Tingley 2003).

Such transit searches are generally performed by comparing
light-curves to a family of models with a common set of parameters:
the transit period T, the transit duration η, the epoch τ (which
is equal to the time t at the start of the first transit) and the
transit depth θ. The best set of parameters is identified by finding the
model most likely to have given rise to the observed data, i.e. the
model with the highest likelihood L. This is exactly the kind of
problem MOPED (Heavens et al. 2000) was designed to address.
In particular, light-curves contain plenty of redundant information:
the light between transits. By using MOPED one can give more weight to
the part of the light-curve that is sensitive to the transit,
constructing one eigenvector for each of the parameters in the transit
model. However, for the case of transit detection in light-curves,
the MOPED eigenvectors are sensitive to the fiducial model and can
therefore incorrectly overweight some data. In this paper we present a
solution to this problem by building an ensemble of fiducial models. We
find that for each model in the ensemble there are
many possible solutions. However, only one solution is common to
all models in the ensemble: the one with the correct
parameter values of the transit. We construct a new statistical
measure to determine, for the set of fiducial models, the correct values
of the transit parameters. We also show that our algorithm
passes the null test, i.e. it correctly identifies a light-curve with no
transit. The set of fiducial models can be pre-computed and we provide
a recipe to do this. This needs to be done only
once, before the search for transits is performed in a set of
light-curves.

The speed-up in the analysis is significant. For a simulated
light-curve typical of Pan-STARRS we find that our algorithm is
10⁸ times faster than a search in the full χ² space. The speed-up is
due to the fact that, using MOPED, the maximum likelihood search
is performed on four numbers (one per parameter) instead of
thousands of data points, and that the ensemble of fiducial models can
be pre-computed. This increase in the speed of computing the
likelihood is important for transit analysis: the likelihood surface
has multiple maxima, of which only one is the desired solution, and
therefore the search for the best solution needs to explore the whole
likelihood surface.¹

This paper is organized as follows: In Section 2, we briefly
describe MOPED. Section 3 presents the transit model used and
how a set of synthetic light-curves was constructed. In Section 4,
we describe the extension of MOPED using an ensemble of fiducial
models and we also present how the results should be compared
to the null hypothesis. Results are discussed in Section 5. In
Section 6, we describe the numerical method, including a numerical
recipe, and our conclusions are summarized in Section 7.



1 In surveys with high cadence and short observational period (e.g. TrES)
the likelihood surface is smooth and methods utilizing smart searches of the
likelihood surface are better suited.

2 MOPED

We briefly review the parameter estimation and data compression
method MOPED, which was originally described in Heavens et al.
(2000). The method is as follows: given a set of data x (in our case
a light-curve) which includes a signal part μ and noise n, i.e.

$$ \mathbf{x} = \boldsymbol{\mu} + \mathbf{n}, \qquad (1) $$

the idea is to find weighting vectors b_m, where m runs from 1
to the number of parameters M, such that

$$ y_m = \mathbf{b}_m \cdot \mathbf{x} \qquad (2) $$

contain as much information as possible about the parameters (period,
duration of the transit etc.). These numbers y_m are then used
as the data set in a likelihood analysis, with the consequent increase
in speed at finding the best solution. In MOPED, there is one vector
associated with each parameter.

In Heavens et al. (2000) an optimal and lossless method was
found to calculate b_m for multiple parameters (as is the case with
transits). The definition of lossless here is that the Fisher matrix at
the maximum likelihood point is the same whether we use the full
dataset or the compressed version. The Fisher matrix is defined by:

$$ F_{\alpha\beta} \equiv -\left\langle \frac{\partial^2 \ln\mathcal{L}}{\partial\theta_\alpha\,\partial\theta_\beta} \right\rangle \qquad (3) $$

where the average is over an ensemble with the same parameters
(θ_α, θ_β) but different noise. The a posteriori probability for the
parameters is the likelihood, which for Gaussian noise is

$$ \mathcal{L}(\theta_\alpha) = \frac{1}{(2\pi)^{N/2}\sqrt{\det\mathbf{C}}}\,\exp\left[-\frac{1}{2}\sum_{i,j}(x_i-\mu_i)\,C^{-1}_{ij}\,(x_j-\mu_j)\right] \qquad (4) $$

where N is the number of data points. The Fisher matrix gives a good
estimate of the errors on the parameters, provided the likelihood
surface is well described by a multivariate Gaussian near the peak.
The method is strictly lossless in this sense provided that the noise
is independent of the parameters, and provided our initial guess of
the parameters is correct. This is not exactly true because our initial
guess is inevitably wrong. However, the increase in parameter errors is
very small in these cases (see Heavens et al. 2000): MOPED recovers the
correct solutions extremely accurately even when the conditions for
losslessness are not satisfied. The weights required are

$$ \mathbf{b}_1 = \frac{\mathbf{C}^{-1}\boldsymbol{\mu}_{,1}}{\sqrt{\boldsymbol{\mu}_{,1}^{t}\,\mathbf{C}^{-1}\boldsymbol{\mu}_{,1}}} \qquad (5) $$

$$ \mathbf{b}_m = \frac{\mathbf{C}^{-1}\boldsymbol{\mu}_{,m} - \sum_{k=1}^{m-1}\left(\boldsymbol{\mu}_{,m}^{t}\mathbf{b}_k\right)\mathbf{b}_k}{\sqrt{\boldsymbol{\mu}_{,m}^{t}\,\mathbf{C}^{-1}\boldsymbol{\mu}_{,m} - \sum_{k=1}^{m-1}\left(\boldsymbol{\mu}_{,m}^{t}\mathbf{b}_k\right)^2}} \qquad (m>1) \qquad (6) $$
where a comma denotes the partial derivative with respect to the
parameter m and C is the covariance matrix with components
C_ij = ⟨n_i n_j⟩; i and j run from 1 to the size of the dataset. To
compute the weight vectors requires an initial guess of the parameters.
We term this the fiducial model (q_f) and we discuss in §6 the
impact of the choice of the fiducial model on the MOPED solution.
For the case of transits, C does not depend on the parameters and
therefore the b_m depend only on the fiducial parameters (q_f). On
the other hand, μ represents the signal part and thus depends on
the free parameters, which we denote by q.

The dataset {y_m} is orthonormal: i.e. the y_m are uncorrelated
and of unit variance. The y_m have means

$$ \langle y_m \rangle = \mathbf{b}_m(\mathbf{q}_f)\cdot\boldsymbol{\mu}(\mathbf{q}). \qquad (7) $$

The new likelihood is easy to compute, namely,

$$ \ln\mathcal{L}(\theta_\alpha) = {\rm constant} - \frac{1}{2}\sum_{m=1}^{M}\left(y_m - \langle y_m\rangle\right)^2 = {\rm constant} - \frac{1}{2}\sum_{m}\left[\mathbf{b}_m(\mathbf{q}_f)\cdot\mathbf{x} - \mathbf{b}_m(\mathbf{q}_f)\cdot\boldsymbol{\mu}(\mathbf{q})\right]^2 \qquad (8) $$

Further details are given in Heavens et al. (2000).

It is important to note that if the covariance matrix is known
for a large dataset (e.g. a large synoptic survey) or it does not
change significantly from light-curve to light-curve, then the ⟨y_m⟩
need be computed only once for the whole dataset, thus massively
speeding up the computation of the likelihood.
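The compression step can be illustrated with a short sketch. It assumes white noise with a single standard deviation `sigma` (so that C = σ²I), takes the derivatives of the signal with respect to each parameter as plain lists, and builds the weights with the Gram-Schmidt-like construction of Heavens et al. (2000). The names `moped_weights` and `compress` are illustrative and do not come from the paper.

```python
import math


def dot(u, v):
    """Scalar product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def moped_weights(derivs, sigma):
    """Return one weight vector per parameter.

    derivs : list of M lists, the derivative of the model with respect to
             each parameter, evaluated at the fiducial model.
    sigma  : noise standard deviation (assumption: C = sigma**2 * identity).
    """
    inv_var = 1.0 / sigma ** 2
    weights = []
    for d in derivs:
        v = [inv_var * di for di in d]      # C^-1 mu_,m
        norm2 = inv_var * dot(d, d)         # mu_,m^t C^-1 mu_,m
        for b in weights:                   # remove components along earlier vectors
            proj = dot(d, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
            norm2 -= proj ** 2
        weights.append([vi / math.sqrt(norm2) for vi in v])
    return weights


def compress(weights, x):
    """Compress a light-curve x into the M numbers y_m = b_m . x."""
    return [dot(b, x) for b in weights]
```

With weights built this way the compressed numbers are uncorrelated and of unit variance, i.e. σ² b_m · b_n equals 1 for m = n and 0 otherwise, provided the derivative vectors are linearly independent.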
Figure 1. Top panel: Synthetic light curve with transit signal; S/N ~ 5,
period T = 1.3 days, transit duration η = 0.1 days, transit depth θ ~
0.01. Bottom panel: Same synthetic light curve folded with the correct period
T = 1.3 days. The folded light-curve has not been used in any of the
analysis and is shown here only for illustration.



3 TRANSIT MODEL AND SYNTHETIC LIGHT-CURVES 
3.1 Transit model 

For the transit analysis we have constructed a model that closely
represents the shape of a planetary transit light-curve. An obvious
and commonly chosen approach is to use a square wave: μ(t) = −1 for
−1 < t < 1 and μ(t) = 0 otherwise. However, in order to allow
for softer edges and to have a model that is analytically
differentiable, we used a smooth function of the four transit
parameters, where T is the period, τ is the epoch, η is the transit
duration, θ is the depth of the transit and c is a constant.²

Applying the transit model to the MOPED framework, one
needs to calculate the weight vectors b (eq. 6), which depend on
the derivatives of the model μ with respect to the
four parameters T, θ, η, τ. These derivatives can be calculated
analytically and are thus computationally inexpensive, since they do not
require conditional statements.
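As an illustration of what a soft-edged periodic transit profile can look like, the sketch below smooths a box with two hyperbolic tangents. This is an assumed functional form chosen for the example, not necessarily the function used in this work; the name `transit_model` and the use of relative flux (negative inside the transit) are likewise assumptions. Only the parameter meanings (T, η, θ, τ, c) follow the text.

```python
import math


def transit_model(t, T, eta, theta, tau, c=100.0):
    """Soft-edged periodic box: about -theta inside the transit, about 0 outside.

    The transit starts at epoch tau, lasts eta and repeats with period T.
    c sets the sharpness of the edges, in units of the transit duration.
    """
    # Time from mid-transit, wrapped into [-T/2, T/2).
    s = ((t - tau - 0.5 * eta + 0.5 * T) % T) - 0.5 * T
    u = s / eta  # in units of the duration, so the transit is |u| < 1/2
    dip = 0.5 * (math.tanh(c * (u + 0.5)) - math.tanh(c * (u - 0.5)))
    return -theta * dip
```

For example, `transit_model(0.05, 1.3, 0.1, 0.01, 0.0)` evaluates the profile at mid-transit, while `transit_model(0.65, 1.3, 0.1, 0.01, 0.0)` evaluates it half a period later, far outside the transit.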



2 c controls the sharpness of the edges. We used c = 100 for all
calculations in this work.



3.2 Synthetic light-curves 

In order to test our method and estimate the gain in speed we
created a sample of synthetic light-curves by setting the four free
parameters to realistic values and generating magnitudes according to
the transit model of Section 3.1, with Gaussian noise added to better
simulate real light-curves. We adjusted the Gaussian noise to achieve
the desired signal-to-noise (S/N) values.

We simulated observational sampling patterns from Pan-STARRS
(one observation every 10 minutes, four times a month)
and generated magnitudes as described in the following equation

$$ x(t_i; T, \eta, \theta, \tau) = \mu(t_i; T, \eta, \theta, \tau) + n_i \qquad (11) $$

where t_i are the observation times and n_i is Gaussian noise
obtained from the Pan-STARRS photometric accuracy of 0.01 magnitudes.
Fig. 1 (top panel) shows a typical synthetic light-curve with
period 1.3 days and S/N = 5.
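A minimal sketch of how such a light-curve could be generated is given below. The sampling pattern is an assumption made for the example: four observing nights per 30-day month, each with 42 observations 10 minutes apart, which gives about 2000 points per year. For brevity the signal is a sharp-edged box rather than the smooth model, and the function names are hypothetical.

```python
import random


def observation_times(n_months=12, nights_per_month=4, per_night=42,
                      cadence_min=10.0):
    """Observation times in days for the assumed sampling pattern."""
    times = []
    for month in range(n_months):
        for k in range(nights_per_month):
            night_start = 30.0 * month + 7.5 * k
            for j in range(per_night):
                times.append(night_start + j * cadence_min / 1440.0)
    return times


def box_transit(t, T, eta, theta, tau):
    """Sharp-edged stand-in for the signal: -theta in transit, 0 otherwise."""
    return -theta if (t - tau) % T < eta else 0.0


def synthetic_light_curve(times, T, eta, theta, tau, sigma=0.01, seed=0):
    """Signal plus independent Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    return [box_transit(t, T, eta, theta, tau) + rng.gauss(0.0, sigma)
            for t in times]
```

A call such as `synthetic_light_curve(observation_times(), T=1.3, eta=0.1, theta=0.01, tau=0.0)` then returns one noisy realization; changing `seed` gives a different noise realization for the same transit parameters.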



4 EXTENSION TO MOPED USING AN ENSEMBLE OF FIDUCIAL MODELS



Unlike the case of galaxy spectra (Heavens et al. 2000), the fiducial
model will give some data a high weight, very erroneously, if the fiducial
model is far from the true model. This is because the derivatives
of the fiducial model with respect to the parameters are large
near the walls of the box-like shape of the model.

In this section we present an alternative approach to find the
best-fitting transit model to a light-curve. The method is based on
using an ensemble of randomly chosen fiducial models. For an arbitrary
fiducial model the likelihood function (eq. 8) will have several
maxima, one of which is guaranteed to be the correct solution. This
is the case where the values of the free parameters (q) are close to
the true ones, so that μ(q) in eq. 8 is similar to x. For a different
arbitrary fiducial model there are also several maxima, but only one of
them, the true one, is guaranteed to be present again. Therefore by
using several fiducial models one can eliminate the spurious maxima
and keep the one that is common to all the fiducial models, which
is the true one. We combine the MOPED likelihoods for different
fiducial models by simply averaging them.³

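The new measure Y is defined as

$$ Y(\mathbf{q}) \equiv \frac{1}{N_f}\sum_{\{\mathbf{q}_f\}} \mathcal{L}(\mathbf{q};\mathbf{q}_f) \qquad (12) $$

where q and q_f are the parameter vectors {T, η, θ, τ} and their
fiducial values {T_f, η_f, θ_f, τ_f}, and N_f is the number of fiducial
models. The summation is over an ensemble of fiducial models {q_f}.
L(q; q_f) is the sum of squared residuals of the compressed data that
appears in the MOPED likelihood (eq. 8), i.e.

$$ \mathcal{L}(\mathbf{q};\mathbf{q}_f) = \sum_m\left[\mathbf{b}_m(\mathbf{q}_f)\cdot\mathbf{x} - \mathbf{b}_m(\mathbf{q}_f)\cdot\boldsymbol{\mu}(\mathbf{q})\right]^2 \qquad (13) $$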
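The averaging over fiducial models can be written in a few lines. The sketch below assumes that the weight vectors of every fiducial model have already been computed and are passed in as nested lists, and that the model μ(q) has been evaluated at the observation times; the function names are illustrative only.

```python
def dot(u, v):
    """Scalar product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def moped_statistic(x, mu_q, weights):
    """Sum over the weight vectors of one fiducial model of
    [b_m . x - b_m . mu(q)]^2, as in eq. 13."""
    return sum((dot(b, x) - dot(b, mu_q)) ** 2 for b in weights)


def ensemble_statistic(x, mu_q, weight_sets):
    """Y(q): average of the per-model statistic over the ensemble (eq. 12).

    weight_sets : one list of weight vectors per fiducial model.
    """
    total = sum(moped_statistic(x, mu_q, w) for w in weight_sets)
    return total / len(weight_sets)
```

Because each term is a square, Y is non-negative and vanishes when the trial model reproduces the data exactly, which is why the true solution shows up as a minimum of Y that is common to all fiducial models.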



Fig. 3 shows Y as a function of period T for ensembles of fiducial
models of different sizes, for a synthetic light-curve with S/N = 3 and
2000 observations. The top panel shows the value of Y using an
ensemble of 3 fiducial models. As can be seen from the figure,
there are several minima. Using an ensemble of 10 fiducial
models (shown in the next panel) reduces the number of minima. In
the last panel we used an ensemble of 20 fiducial models and there
is only one obvious minimum, the true one.

3 This is chosen ad hoc. We have tried other approaches, all of which work
similarly well. Averaging turned out to be the functional form for which the
error and confidence level of the measurement could be easily and
analytically calculated.

Figure 2. Top left panel: Likelihood as a function of period T. Top right
panel: Likelihood as a function of transit duration η. Bottom left panel:
Likelihood as a function of θ. Bottom right panel: Likelihood as a function
of τ. For all parameters the correct value is found. Note that for T the
topology of the likelihood surface is fairly complicated, with many local
minima, which makes efficient minimization techniques inapplicable.

Fig. 2 shows the value of Y as a function of each free parameter
for a synthetic light-curve. We set the values of 3 of the parameters
to the "correct" values (used to construct the light-curve) and
we let the fourth vary freely in each panel. Note that the shape of Y as
a function of η, θ and τ is smooth, whereas the dependence on T
is erratic, suggesting that efficient minimization techniques are not
applicable.



4.1 Confidence and error analysis 

To confidently determine that the minimum found is not spurious,
the likelihood of the candidate solution must be compared to the
value and distribution of Y derived from a set of light-curves with
no transit signal. One can simulate a set of null light-curves and
build a distribution by calculating the value of Y for each point in
the parameter space for each simulated "null" light-curve, which is a
computationally very expensive task. Alternatively, this null
distribution can be derived analytically.

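Since x ~ N(⟨x⟩, σ_x) and all other variables are deterministic,
it can be shown that Y(q) follows a non-central χ² distribution,
Y(q) ~ χ²(r, λ), where r is the number of degrees of
freedom and λ is the non-centrality of the distribution. The
non-central χ² distribution has mean and variance given by

$$ \mu = r + \lambda, \qquad (14) $$

$$ \sigma^2 = 2\,(r + 2\lambda), \qquad (15) $$

where r = 4 and λ is given by

$$ \lambda = \frac{E^2[X]}{{\rm var}[X]}. \qquad (16) $$

The square of the expectation value is

$$ E^2[X] = \left\{\sum_m\left[\langle x\rangle\,B_m(\mathbf{q}_f) - D_m(\mathbf{q};\mathbf{q}_f)\right]\right\}^2 \qquad (17) $$

where we define

$$ B_m(\mathbf{q}_f) = \sum_t b_m^{t}(\mathbf{q}_f), \qquad (18) $$

$$ D_m(\mathbf{q};\mathbf{q}_f) = \mathbf{b}_m(\mathbf{q}_f)\cdot\boldsymbol{\mu}(\mathbf{q}), \qquad (19) $$

(the sum in eq. 18 runs over the components t of the weight vector)
and the variance is given by

$$ {\rm var}[X] = {\rm var}\left[\sum_m\left(\mathbf{b}_m(\mathbf{q}_f)\cdot\mathbf{x} - \mathbf{b}_m(\mathbf{q}_f)\cdot\boldsymbol{\mu}(\mathbf{q})\right)\right] = \sum_m\left|\mathbf{b}_m(\mathbf{q}_f)\right|^2\,{\rm var}[x] \qquad (20) $$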
Figure 3. Y as a function of period T for a set of fiducial models for a
synthetic light-curve with S/N = 3, 2000 observations and T = 1.3 days.
The top panel shows the value of Y using 3 randomly selected fiducial
models, the middle panel 10 and the bottom panel 20. As the number of
fiducial models used increases, the number of minima decreases. At N_f =
20 there is only one obvious minimum, at T = 1.3 days.



where we define β_m(q_f) to be

$$ \beta_m(\mathbf{q}_f) = \mathbf{b}_m(\mathbf{q}_f)\cdot\mathbf{b}_m(\mathbf{q}_f). \qquad (21) $$

Combining the above equations we get

$$ \lambda = \frac{\left\{\sum_m\left[\langle x\rangle\,B_m(\mathbf{q}_f) - D_m(\mathbf{q};\mathbf{q}_f)\right]\right\}^2}{\sigma_x^2\,\sum_m\beta_m(\mathbf{q}_f)} \qquad (22) $$

To compute confidence levels for a particular Y we integrate a
non-central χ² distribution with non-centrality given by eq. 22 from
Y(q) to infinity. This is done numerically, but it is still a very quick
operation. Furthermore, as we will show in Section 6, this is
performed only a few times per light-curve.
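The tail integral of a non-central χ² distribution is available in standard numerical libraries (for instance `scipy.stats.ncx2.sf` in SciPy). The dependency-free sketch below is an alternative that relies on two textbook facts: a non-central χ² is a Poisson-weighted mixture of central χ² distributions, and for an even number of degrees of freedom the central tail has a closed form. It is restricted to even r (r = 4 here) and to moderate arguments, and the function name is hypothetical.

```python
import math


def ncx2_sf_even(y, r, lam, n_terms=200):
    """P(X > y) for X ~ non-central chi-square(r, lam), r even and positive.

    Uses  P(X > y) = sum_i Poisson(i; lam/2) * P(chi2_{r+2i} > y)  with
    P(chi2_{2k} > y) = exp(-y/2) * sum_{j<k} (y/2)^j / j!.
    Intended for moderate y and lam; very large values can overflow.
    """
    if r <= 0 or r % 2 != 0:
        raise ValueError("this sketch only handles even, positive r")
    if y <= 0.0:
        return 1.0
    half_y, half_lam = 0.5 * y, 0.5 * lam
    term, partial = 1.0, 0.0
    for j in range(r // 2):              # partial = sum_{j < r/2} (y/2)^j / j!
        partial += term
        term *= half_y / (j + 1)
    weight = math.exp(-half_lam)         # Poisson weight for i = 0
    decay = math.exp(-half_y)
    total = 0.0
    for i in range(n_terms):
        total += weight * decay * partial
        partial += term                  # extend the inner sum by one term
        term *= half_y / (r // 2 + i + 1)
        weight *= half_lam / (i + 1)     # next Poisson weight
    return min(total, 1.0)
```

In the notation of this section, the confidence attached to a value Y would then be `ncx2_sf_even(Y, 4, lam)` with `lam` evaluated from eq. 22.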

Fig. 4 shows the values of Y(T) for the null case (i.e. a
light-curve without a transit), both simulated (crosses) and theoretically
calculated using the equations above (the solid line is the expected
value and the dotted line is the 80% confidence level). It is clear that
the simulated values agree well with the theoretical ones. Note that
because the confidence can be calculated analytically we do not
have to simulate null light-curves and recalculate Y for each
light-curve, thus gaining computational speed.

Figure 4. Values of Y(T) for the null case (i.e. a light-curve without a
transit), both simulated (crosses) and analytically calculated (see Section 4.1;
the solid line is the expected value and the dotted line is the 67% confidence
level). It is clear that the simulated values agree well with the theoretical ones.



5 RESULTS 

Fig. 2 shows the likelihood as a function of each parameter
for a typical synthetic light-curve. These searches were
performed in only one parameter at a time; nevertheless, we
successfully recover the true values of the transit parameters.

In Fig. 5 we show the value of Y as a function of period for
synthetic light-curves with a transit period of 1.28 days. The run was
done using 40 fiducial models. The different panels show different
values of S/N. The dotted line shows the 80% confidence level.
For all 4 cases there is a well defined minimum at the right period,
where the minimum is below the 80% level for S/N as low as 5
and at 71% for S/N = 3.

The more realistic case is to perform the search in the
four-parameter space simultaneously and show that our method
successfully recovers the "correct" values of T, η, θ and τ for a sample of
synthetic light curves. This is shown in Figs. 6, 7 and 8, where the
2D projections of the four-dimensional search are presented. The
different contours correspond to 50%, 65% and 80% confidence
levels. It is worth commenting on the "multiple" maxima in the
likelihood. This feature also appears in the one-dimensional search:
multiple minima appear at multiples of the true period, but note
that the best-fitting model is still the true period only (at the 50%
confidence level the other solutions are excluded). This behavior is
expected, since when the period is allowed to be a multiple of the
true one, one out of n transits (n is an integer) will fit and therefore
will produce a better fit than the null case. These multiple
solutions can be easily excluded by keeping the shortest period. This
only occurs for T; the other parameters have only one well defined
minimum, at the true value.



5.1 Application to multiple light-curves 

After having shown that our algorithm works properly on several
synthetic light-curves, we now explore the performance of
the method for a wide range of values of T, θ, η and τ. In
particular we have simulated light-curves for 0.1 < T < 4
days; 1% < η < 5% of the period; 0 < τ < T;
R_planet/R_* ~ 0.1 and 12 < V < 24. The observation frequency
of the light-curves is similar to that of a Pan-STARRS light-curve.
This parameter space and observation frequency should cover the
range of transit observations expected from surveys like
Pan-STARRS (http://pan-starrs.ifa.hawaii.edu) and
LSST (http://www.lsst.org).

Figure 6. Projection of the four-dimensional likelihood surface on the two

Figure 7. Same as Fig. 6 but for parameters θ and T.

We have simulated 100 light-curves with S/N = 5. For each
light-curve we estimated the likelihood Y(q_true) for the ensemble
of fiducial models and then calculated the confidence that this
value is not a spurious detection. Fig. 5 shows the distribution of
these confidence values. Two curves are plotted: the dotted line is
for light-curves with a total of 2000 observations (about 1 year).
The thick solid line is for the case when the range is doubled to
4000 measurements (2 years). For the higher number of observations
there is a significant increase in the confidence of recovering
the true period. For this case most transits (80%) are found with a
confidence over the null case higher than 70%, i.e. for these
light-curves the recovered period T has a confidence greater than 70%
of being the correct one. In about 25% of the cases the confidence in
recovering the true period is greater than 90%. For the case of 2000
measurements the success rate is somewhat lower. This is because the
error on the estimated parameters depends on the number of
observations. For θ, η and τ it depends on the number of observations
in the transits, whereas for T it depends on the number of
transits observed. One can show that the probability of observing
a single transit is proportional to η/T, though the probability of
observing multiple transits is smaller. Furthermore, it also depends
on the irregularity of the observational times (the more irregular the
times, the better the chance of recovering the signal).

Figure (histogram; horizontal axis: confidence in %). Distribution of the
confidence in recovering the period T for a set of 100 simulated light curves
with S/N = 5, for the range of values of transit parameters described in
Section 5.1. Two sets of 100 light curves are shown: one where the number of
measurements is 2000 (dotted line) and another with 4000 measurements (solid
line). Note that for the 4000 measurements case most periods are recovered with
confidence higher than 80% and that for about 50% of the simulated light curves
the confidence in recovering the true period is greater than 95%.



6 NUMERICAL METHOD 

The real advantage of the present method lies in the fact that for a
set of light-curves most of the work can be done once, in calculating
the fiducial models. In this section we describe the numerical approach
in more detail and present the numerical gain over the
brute force calculation using the full χ².

6.1 Calculations of the fiducial models 

The second term in the brackets of the MOPED likelihood (eq. 8) does not
depend on the actual light-curve data x. Therefore D_m(q_f, q), b_m(q_f)
and β_m(q_f) for each fiducial model can be pre-calculated once and
stored in files. Thus for each light-curve we only need to calculate
Σ_t b_m^t(q_f) x_t, which is independent of the search parameter q.
This is a major advantage of our method. Before we describe the
numerical steps in more detail we need to address how we choose the
fiducial models and how many fiducial models are needed.



6.2 Choice of fiducial models 

There are three questions that we need to address about the choice 
of fiducial models. 

Number of fiducial models: Since the confidence level can be
calculated at each iteration step, the number of fiducial models does
not need to be pre-determined. If there is only one solution with
confidence larger than 70%, that solution is considered to be the correct
one and the iteration is stopped. Yet, for light-curves with low S/N
the actual solution may never exceed that threshold. Therefore we
imposed a maximum of 100 fiducial models. Besides, for a typical
survey only a few light-curves are expected to contain a
transit signal, thus for most cases the iteration will be terminated at
the 100 fiducial model limit.

Choice of fiducial parameters: We have found that the best 
performance in finding the true solution comes when we choose 
fiducial models from a flat distribution spanning the full range of 
the free parameters. 

Choice of search parameters q: Despite the fact that
D_m(q_f, q) is calculated only once and does not contribute
to the overall computational burden of finding transits in a
set of light-curves, the size of the database (files) that stores the
fiducial model information depends heavily on the range and grid size
chosen for the free parameters. This is especially important for
future space missions, where the available memory is limited.

As can be seen from Fig. 2 (top-left panel), finding the "correct"
period, where there is a non-linear relationship between likelihood
and period, is the most difficult task. This is due to the fact that
a small change in the value of the period T produces a large variation
at the tail of the light-curve.

Theory suggests that the asymptotic standard deviation of the 
estimate of the period is of the order y -2 / 3 , so the grid should be 
that small too. We therefore performed the search on a uniform grid
in frequency, T⁻¹, rather than on a uniform grid in T. There is also
a related question of how fine the search grid should be for η and τ.
Since for data folded at period T the folded observation times are
roughly uniform, and the average spacing of subsequent folded
observations is T/N (N is the total number of observations), T/N is a
natural choice for the grid size for the η and τ searches. For a typical
search the total number of searches can be as high as 10⁹, which
translates to 1 TB of data. This is prohibitive for space missions. In
what follows we examine how to further reduce the search space
using physical and statistical arguments.
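The two grids described above can be sketched as follows. The function names and the choice of returning plain lists are assumptions made for the example; only the spacing rules (uniform in 1/T for the period, step T/N for η and τ) come from the text.

```python
def period_grid(t_min, t_max, n):
    """n trial periods, uniformly spaced in frequency 1/T,
    returned in order of increasing period."""
    f_hi, f_lo = 1.0 / t_min, 1.0 / t_max
    step = (f_hi - f_lo) / (n - 1)
    return [1.0 / (f_hi - k * step) for k in range(n)]


def phase_grid(period, n_obs):
    """Trial values between 0 and T with spacing T/N, usable for tau
    (and, over a shorter range, for eta)."""
    step = period / n_obs
    return [k * step for k in range(n_obs)]
```

A grid that is uniform in frequency is much denser at short periods than a grid that is uniform in T, which matches the need for finer period sampling where many cycles fit in the observing window.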

Transit length range: For a given period one can allow η to
take values between 0 and T/2. This is a naive estimate based
on the fact that the planet spends half of the time in front of the
star. The range of η can be further limited using geometrical
arguments and Kepler's law. It can be shown that the transit duration is
(Sackett 1999)

$$ \eta = \frac{T}{\pi}\sqrt{\left(\frac{R_*}{a}\right)^2 - \cos^2(i)} \qquad (23) $$

where R_* is the radius of the star, a is the orbital radius of the planet
and i is the inclination angle. The maximum value that η can take
is reached when cos(i) is zero. Using Kepler's law, the ratio
of duration over period is

$$ \frac{\eta}{T} = \frac{R_*}{\pi}\left[\frac{T^2\,G\,M_*}{4\pi^2}\right]^{-1/3} \qquad (24) $$

where M_* is the mass of the star. For a typical main sequence star this
yields η/T ≈ 4% for periods of 1-2 days. The fraction gets smaller as
the period increases, resulting in a gain of a factor of 50 in
computational time (compared to the naive approximation η/T ≈ 1/2).
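As a numerical illustration of the scaling above, the sketch below evaluates the maximum duty cycle R_*/(π a) for a central transit, with the orbital radius a obtained from Kepler's third law. Solar values for the stellar mass and radius are assumed, the physical constants are rounded, and the function name is hypothetical.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2 (rounded)
M_SUN = 1.989e30     # solar mass, kg (rounded)
R_SUN = 6.957e8      # solar radius, m (rounded)
DAY = 86400.0        # seconds per day


def max_duration_fraction(period_days, m_star=M_SUN, r_star=R_SUN):
    """Upper bound on eta/T for a central transit: R_* / (pi * a)."""
    period = period_days * DAY
    a = (G * m_star * period ** 2 / (4.0 * math.pi ** 2)) ** (1.0 / 3.0)
    return r_star / (math.pi * a)
```

For a Sun-like star and periods of 1 to 2 days this bound is of the order of several percent, and it falls as T^(-2/3), so restricting η to this range is far tighter than allowing values up to T/2.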

Longest period: Equation 24 can be used to determine the
longest period that can be recovered from the data, namely the
period at which the ratio of transit duration to period is small enough
that the probability of observing more than a few occultations is
insignificant. It can be shown that for most inclinations the
probability of observing an occultation is given by

$$ p \simeq \frac{1}{2\pi}\,\frac{2R_*}{a} \qquad (25) $$

This is basically the probability that the occultation time and the
observation time overlap.

The probability of observing x occultations during the whole
lifetime of the survey is given by a binomial distribution. In the
limit where the number of observations is large, the probability
distribution becomes a Gaussian distribution

$$ P_t(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad (26) $$

where p is given in eq. 25. The mean value is given by

$$ \mu = n_t\,p, \qquad (27) $$

and the standard deviation by

$$ \sigma = \sqrt{n_t\,p\,(1-p)}, \qquad (28) $$

where n_t is the number of complete transit cycles during the survey
(the time span of the survey divided by T). The probability of
observing at least 3 transits is therefore given by the integral

$$ P(x > 3) = \frac{1}{\sqrt{2\pi\sigma^2}}\int_3^{\infty}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]dx \qquad (29) $$

For a typical main sequence star, a planet with a period of 20 days
has a probability of observing 10 occultations that is less than a few
percent. Using that as the upper limit to our search reduces the
number of iterations by a factor of 5-10.
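The Gaussian tail integral above has a closed form in terms of the complementary error function, so it can be evaluated directly. The sketch below takes the number of transit cycles `n_t` and the per-cycle probability `p` as inputs; the function name is hypothetical, and the Gaussian limit is only a reasonable approximation when n_t is large.

```python
import math


def prob_at_least(k, n_t, p):
    """Gaussian approximation to the probability of observing at least
    k occultations out of n_t cycles, each observed with probability p."""
    mu = n_t * p
    sigma = math.sqrt(n_t * p * (1.0 - p))
    return 0.5 * math.erfc((k - mu) / (sigma * math.sqrt(2.0)))
```

Scanning this function over trial periods (with p and n_t both decreasing as the period grows) gives a simple way to locate the period beyond which too few occultations are expected to be worth searching.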

6.3 Numerical recipe 

The steps of the numerical method are described below.

(i) Select a set of fiducial models. The fiducial parameters should
span the domain of the search parameters.

(ii) Calculate D_m(q_f, q), b_m(q_f) and β_m(q_f) for each fiducial
model. The range and sampling frequency of the free parameters
q are chosen according to the physical arguments described above. Save
the values in a database (binary files).

(iii) For each light-curve calculate Σ_t b_m^t(q_f) x_t.

(iv) Search through the fiducial models for D_m(q_f, q) with values
similar to Σ_t b_m^t(q_f) x_t from the previous step. Note that since the
database is sorted with respect to D, this is a log(N_q) operation,
where N_q is the number of free parameter values.

(v) Calculate Y for those parameters for which Σ_t b_m^t(q_f) x_t −
D_m(q_f, q) is small.

(vi) Compute the confidence level for the selected q's using
eq. 22. Note that since the D's, B's and β's are pre-calculated we only
need to compute ⟨x⟩.

(vii) If there is only one minimum with confidence level higher
than 70%, exit.

(viii) If the number of fiducial models is larger than 100, exit.

(ix) Go back to (iii).
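The stopping rule of steps (v) to (ix) can be sketched as a loop that adds one fiducial model at a time. The sketch is deliberately abstract: `statistic` and `confidence` are hypothetical callables standing in for the pre-computed quantities and for the confidence calculation, and the sorted database lookup of step (iv) is replaced by a plain scan over a list of candidate parameter values.

```python
def search_with_ensemble(candidates, fiducials, statistic, confidence,
                         threshold=0.70, max_models=100):
    """Add fiducial models until exactly one candidate passes the threshold.

    statistic(q, q_f) : per-model value for trial parameters q (hypothetical)
    confidence(Y)     : confidence attached to an averaged value Y (hypothetical)
    Returns (selected candidate or None, number of fiducial models used).
    """
    totals = [0.0] * len(candidates)
    used = 0
    for q_f in fiducials[:max_models]:
        used += 1
        for i, q in enumerate(candidates):
            totals[i] += statistic(q, q_f)
        averaged = [s / used for s in totals]        # running average Y(q)
        passing = [i for i, y in enumerate(averaged)
                   if confidence(y) > threshold]
        if len(passing) == 1:                        # unique solution: stop
            return candidates[passing[0]], used
    return None, used                                # model limit reached
```

Returning `None` when the limit is reached mirrors the expectation that most light-curves in a survey contain no transit and will exhaust the full set of fiducial models.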



6.4 Required number of operations 

The brute force minimization of the likelihood function requires

$$ \sim N_{\rm obs}\,N_q \qquad (30) $$

operations. The number of operations for our method after the fiducial
models are computed is

$$ {\rm MOPED+} \sim N_{\rm obs}\,N_{\rm fid} \qquad (31) $$

For a typical light-curve with low observing frequency like
Pan-STARRS, in the four-dimensional parameter space N_q can easily
be 10¹⁰. This number is large because of the non-linear dependence
of the likelihood on the period, thus N_T ~ 100000 (see the arguments
above). In contrast, N_fid ~ 10², which means an improvement
in speed of a factor of 10⁸.
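Written out, the quoted speed-up is simply the ratio of the two operation counts, in which the number of observations cancels:

```latex
\frac{N_{\rm obs}\,N_q}{N_{\rm obs}\,N_{\rm fid}}
  = \frac{N_q}{N_{\rm fid}}
  \sim \frac{10^{10}}{10^{2}}
  = 10^{8}
```

The gain therefore depends only on how the size of the parameter grid compares with the number of fiducial models, not on the length of the light-curve.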



7 CONCLUSIONS 

We have presented a new algorithm for the fast and efficient detection
of transits in light-curves. Our algorithm produces a major speed-up
factor in transit searches, of about eight orders of magnitude,
compared to the brute force method using the full χ². This
translates into finding a transit in a light-curve with 10⁴ observations
in well under a second on current desktop computers. We have
developed a four-parameter model for the transit of an object and have
shown, using synthetic light-curves, that our algorithm is successful
at recovering the true parameters of the transit. We have simulated
a set of light-curves with the sampling rate and photometric
accuracy expected in large synoptic surveys like Pan-STARRS and
shown that for a large range in the values of the parameters (T, η,
θ, τ) we recover the true values. For surveys like Pan-STARRS and
LSST it should be possible to detect transits by Jovian planets and
planets several times the size of Earth. Since the expected detection
rate of transits in these large surveys is very low, only one transit out
of thousands of light-curves, we believe that our method provides a
fast and efficient algorithm to detect transits for future surveys.